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A Hidden Markov Model (HMM) is a common statistical model which is widely used for 
y^ . analysis of biological sequence data and other sequential phenomena. In the present paper 

we show how HMMs can be extended with side-constraints and present constraint solv- 
ing techniques for efhcient inference. Defining HMMs with side-constraints in Constraint 
Logic Programming have advantages in terms of more compact expression and prun- 
ing opportunities during inference. We present a PRISM-based framework for extending 
HMMs with side-constraints and show how well-known constraints such as cardinality 

Cn ■ and all_dif f erent are integrated. We experimentally validate our approach on the bio- 

^ ' logically motivated problem of global pairwise alignment. 
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1 Introduction 

Hidden Markov Models (HMMs) are one of the most popular models for analysis 
of sequential processes taking place in a random way, where "randomness" may 
also be an abstraction covering the fact that a detailed analytical model for the 
internal matters is unavailable. Such a sequential process can be observed from 
outside by its emission sequence (letters, sounds, measures of features, all kinds 
of signals) produced over time, and an HMM postulates a hypothesis about the 
internal machinery in terms of a finite state automaton equipped with probabil- 
ities for the different state transitions and single emissions. A common inference 
for a given observed sequence means to compute the "best" state transitions that 
the HMM may go through to produce the sequence, and thus this represents a 
best hypothesis for the internal structure or "content" of the sequence. HMMs are 
widely used in speech recognition and biological sequence analysis (jRabiner 1989( 
IDurbinet al. 1998| . 
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The efficiency of computations on HMMs heavily depends on the Markov prop- 
erty. Decisions made during a process run depends only on a limited past. Dynamic 
programming algorithms, such as Viterbi and Forward-Backward, are then used to 
perform efficient inference. However, many problems would require more complex 
dependencies among elements of the process. For example, it may be interesting 
to constrain an HMM to visit only different states or limit the number of visits to 
a given state. It is possible to model the all_dif f erent constraint for the states 
visited by extending the underlying finite state automaton, but for the price of 
a factorial number of new states and with an obvious impact on inference. As an 
alternative to modifying the HMM structure, we instead extend the HMM with side- 
constraints dSato and Kameya 2008| IRoth and Yih 2005p . However, classical algo- 
rithms, such as Viterbi, must be modified to take care about these side-constraints 
dChang et al. 2008| IChristiansen et al. 2009|) . 

In this paper, we extend HMMs with side-constraints, leading to what wc call 
Constrained HMMs (CHMMs). Side-constraints are external constraints declared 
in addition to those defined by the structure of an HMM. The concept of CHMMs 
was introduced by Sato et al. in ( |Sato and Kameya 2008[ ), although earlier and 
unrelated systems have used the same or similar names (discussed in section ^ . 
The contribution of this paper is to define CHMMs as constraint logic programs 
extended with probabilistic choices and to show how to employ this setting for more 
efficient Viterbi computation, i.e., computation of the most probable explanation 
of an observation. Moreover, defining HMMs with side-constraints in Constraint 
Logic Programming have advantages in terms of more compact expression and 
pruning opportunities during inference. We show how to implement CHMMs in 



PRISM ( Sato and Kameya 1997) and how to integrate well-known constraints, such 
as cardinality and all_diff erent, into this framework. We validate our approach 
experimentally on the biologically motivated problem of global pairwise alignment. 
The paper is organized as follows: section [2] describes background on HMMs. In 
section|3l wc formally introduce the constraint model associated with a CHMM. Sec- 
tion |4] describes our PRISM-based framework to define CHMMs. Section [5] presents 
an experimental validation. Finally, sections [6] and [7] present related work and con- 
clusions. 

2 Background 

Here we define Hidden Markov Models (HMM)s and illustrate their application to 
the problem of pairwise global alignment. 



2.1 Hidden Markov Models 

For simplicity of the technical definitions, wc limit ourselves to a discrete Hidden 
Markov Model with a distinguished initial state. 

Definition 2.1 

A Hidden Markov Model (HMM) is a 4-tuple (5, A,T,^), where 
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• S* = {sq, si, . . . , Sm} is a set of states which includes an initial state referred 
to as So; 

• A ^ {ei, 62, . . . , efc} is a finite set of emission symbols; 

• T = {(p(so; si), . . . ,p(so; Sm)), • ■ • , (j>{sm;si), . . . ,p{sm; Sm))} IS a Set of tran- 
sition probability distributions representing probabilities to transit from one 
state to another; 

• E = {(p(si;ei),...,p(si;eA;)),...,(p(s„;ei),...,p(s™;efc))} is a set of emis- 
sion probability distributions representing probabilities to emit each symbol 
from each state. 

We define a run of an HMM as a pair consisting of a sequence of states s'"-' s^^' ... s'"^ , 
called a path and a corresponding sequence of emissions e^^' . . . e'"', called an ob- 
servation^ such that 

• ^^°' = ^°' . . 

• Vi,0 < z < n- l,p(s(*);s(*+^)) > (probability to transit from s^ to s(*+^)); 

• Vi,0 < z < ri,,p(s(*^;e^*^) > (probability to emit e^'' from s'^^). 

The probability of such a run is defined as nj:=i n^C'*^*^"'^''' ■5^*^) • p(s'-'-'; e*^*^). 



2.2 An example HMM: pairwise global alignment 

As an example of an HMM that we later extend with constraints, wc consider the 
problem of aligning two sequences. Sequence alignment is among the most common 
tasks in computational biology, where it is used to align sequences assumed to 
have diverged from a common ancestor. Notice that we here use a so-called pair 
HMM (jDurbin et al. 1998|) which emits two sequences at the same time, and which 
is a straightforward extension of the definition above. 

In the global alignment problem, two sequences x and y must be aligned opti- 
mally, based on a scoring scheme for comparison of different alignments. In proba- 
bilistic modeling, a probability is associated with each pair of symbols emitted from 
a state and similarly a probability for introducing gaps, S, and extending gaps, e, in 
the alignment of the sequences is defined. The probability of an alignment is then 
the product of probabilistic transitions performed to recognize the alignment. In 
biology, these probabilities arc defined to reflect observed statistics about sequence 
mutations and conservation. 

Fig. [1] shows an HMM capable of generating a pair of aligned sequences. When 
given two sequences to align, then a path from the initial state, begin, such that 
the model emits the two sequences, corresponds to an alignment. The initial state, 
begin, docs not emit symbols. The match state emits a pair of symbols (xi,j/j), 
one for each sequence corresponding to alignment of the symbol at position i in 
sequence x and the symbol at position j in sequence y. Emitted symbols can be 
identical or different. A difference represents a potential mutation between the two 
sequences. The insert state emits only the next symbol of sequence x, effectively 
aligning position Xi to a gap in y. Oppositely, the delete state aligns a symbol j/j 
to a gap in sequence x. 
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Fig. 1. A pair HMM for pairwise global alignment of sequences. States, represented 
by squares for emitting states and circles for silent states, are connected by arrows 
representing transitions labeled with probabilities. 

The following example shows an alignment of two short protein sequences, where 
the third line indicates the state sequence of this alignment abbreviated with the 
first letter of the state name: 



Sequence x 
Sequence y 
alignment 



HGKKGA AQV 

KGPKKAQA 

bi i immmddmmm 



In this context, a common task is to find the optimal alignment. This means to find 
a state sequence that can recognize the two sequences and has maximal probability. 
Another is to calculate the probability to observe an emission sequence. A third 
type of inference is parameter learning, where we arc given a set of alignments and 
estimate the "best" parameters for the model, where best usually means that they 
maximize likelihood of the alignments. 

3 A constraint model for HMM with side-constraints 

In this section, we give a formal definition of CHMMs and propose a constraint 
model for CHMM runs. Then, the computation of the most probable path is adapted 
for CHMMs. 



3.1 Constrained Hidden Markov Model 

A CHMM extends an HMM with constraints that limit the set of valid runs and 
leave fewer paths to consider for any given sequence. 

Definition 3.1 

A constrained HMM (CHMM) is defined by a 5-tuple (5", A, T, E, C) where (S, A, T, E) 
is an HMM and C is a set of constraints, each of which is a mapping from HMM 
runs into {true^ false}. 

A run of a CHMM, {path, observation) is a run of the corresponding HMM for 
which C{path, observation) is true. 

Notice that we define constraints in a highly abstract way, independently of any spe- 
cific constraint language. In the following, constraints over finite domains ( [Van Hentenryck et al. 1995[ ) 
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are used, although other constramt languages such as CLP{Q) and CLP{R) could 
have been used as well. 



3.2 Runs of a CHMM as a constraint program 

In this section, we propose to model runs of CHMM by a constraint program over 
finite domains. In this context, a run of CHMM is a solution of the constraint 
program. 

Let (S*, A, T, E, C) be a CHMM and n the sequence length. A constraint program 
for runs is given by the following predicate. 

ran([sW,5i,...,5„],[Si,...,£;„]) 

where each variable Si and Ei represents the state and the emission at the step i. 
The domains of Si and Ei, are given as dom(5'i) = S'\ {sq} ^^nd dom(£'i) = E. The 
run predicate is specified as follows. 

run{[s^'^\Si, . . . ,S'„], [Ei, . . . ,£'„]) is true iff 

3s(i' e dom(S'i), . . . , 3s("' e dom(S'„) and 

3eM^ e dom(£;i), . . . , Be^") G dom(£;„), 
C(s(°)s(i) . . . s("), e(i' . . . e(")) is true, s(°) = sq and 

p(s(°);s(i))-p(.s(i);e(i))...p(s("-i);s("))-p(s*"^e("))>0. (1) 

Formula ^ states that s'^^h'^^^ . . . s^") and e'^^ . . . e*") is a run of the HMM that 
satisfies C. By the definition of run/2, (local) relationships between Si and 5',;+i 
and Si and Ei can be established, since the probability of a run must be positive. 
Indeed, valuation of Si to s'*) and Si+i to s'*+^'' can be part of a solution of 
the constraint program whenever p(s^*^; s^'+^') > 0. These relationships between 
variables of run/2 are modeled by the following constraints, 

trans(Si-i, Si) and emit{Si, Ei),ioT alH, 1 < i < n 

where Si, Si^i and Ei are the variables of run/2. These constraints are defined as 
follows. 

• trans{Si, Si+i) is true iff Bs''' e dom(5'i) and s^'''^^^ e dom(5'i+i) such that 

• emit{Si, Ei) is true iff 3s''' e dom(S,) and e''' G dom(_Ei) such that p(s'*';e*'') > 
0. 

Section |4] below shows an implementation of this framework such that a solution of 
the constraint program corresponds to a valid derivation of a PRISM program. 



H. Christiansen et al. 



3.3 Example: constrained pairwise global alignment 



We consider the HMM presented in section [2?2] and extend it into a CHMM by the 
following set of constraints, 

C = {cardinality_atmost(A^d, [Si, . . . , Sn], delete), 

cardinality_atmost(A'^i, [Si, . . . , Sn], insert)} . 

A constraint cardinality^atmost(7V, L,X) is satisfied whenever L is a list of ele- 
ments, out of which at most N are equal to X.lna. biological context, it is reasonable 
to consider only alignments with a limited number of insertions and deletions given 
the assumption that the two sequences are related. 

As described above, we can consider this CHMM as a constraint program 

run{[s^''\Si,...,S„],[Ei,...,E„]) 

where dom(S'i) G {match, delete, insert}, dom(£'i) e {A, C,D,. . . ,W, Yj^ and the 
constraints C are as described above. 



3.4 Computation of the most probable path for a CHMM 

The Viterbi algorithm (jViterbi 1967P is a dynamic programming algorithm for find- 
ing a most probable path corresponding to a given observation. The algorithm keeps 
track of, for each prefix of an observed emission sequence, the most probable (par- 
tial) path leading to each possible state, and extends those step by step into longer 
paths, eventually covering the entire emission sequence. Here, we adapt this algo- 
rithm for CHMMs. 

Consider a given observation e^^-* . . . e*-"-*, a CHMM {S, A, T, E, C), and its con- 
straint program 

™n([5("),5i,...,5„],[e«,...,e(")]). 

The most probable path is computed by finding the valuation s^^' , . . . , s*-"^ that 
maximizes the objective function: the probability of a run. 

Computation of the most probable path for CHMM is expressed as a rewriting 
system on a set of 5-tuples S. Each such 5-tuple is of form {s, i,p, it, a) where tt is 
a partial path ending in state s and representing a path for the emission sequence 
prefix e^^' ■ ■ ■ e'*'; p is the computed probability for the emissions and transitions 
applied in the construction of tt, and a is the current constraint store seen as a 
conjunction of constraints. Any ground and satisfied constraint will be removed 
from the constraint store, and true refers to the empty conjunction. The set of 
solutions of a constraint store a is denoted by so1((t). 

The two rewriting rules in Fig. [2] describe an iteration step of the computation 



^ This set of letters refers to the 21 different amino acids from which proteins are composed. 
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transMr : E := T.U {(s' ,i+l,p ■ p{s;s') ■ p{s';e'-^+^'>),n s' ,a A S,+ i = s')} 
whenever {s,i,p,TT,a) € E, p(s; s'),p(s'; e''^"*^^) > 
check_constraints(cr A 5,+! — s') and prune^ctr does not apply. 

prunc-ctr : E := E \ {(s, j + l,p', 7r',(7')} 

whenever there is another {s,i + l,p,n,a) £ E with 
p > p' and so1((t') C so1((t). 



Fig. 2. Rewriting rules for the computation of most probable paths for CHMM 

of the most probable patho The computation starts from an initial set of 5-tuples 

{(s(°', 0, 1, e, C A trans{s^°\Si)A 

/y trans{Si, Si+i) A f\ emit[S^,ei))}. (2) 

l<i<n-l l<i<n 

The trans-ctr rule expands an existing partial path one step in directions that 
preserve the satisfaction of the constraint store; this satisfiability check is denoted 
check_constraints (and depends thus on the particular C). The prune-ctr rule re- 
moves partial solutions that are not optimal for the current observation prefix and 
shares the same set of complete solutions with the better partial solution. The 
second condition is necessary in case no partial path contained in sol(cr) can be 
extended into a full path without violating the constraints. We take the following 
correctness property for granted. 

Proposition 3.1 

Assume a CHMM H with the notation as above and an observation Ohs — e^^' ■ • ■ e^"'. 
When the Viterbi algorithm in Fig.[2]is executed from an initial set of 5-tuples given 
the formula ([2]), it terminates with a set of 5-tuples Y, final- It holds that 

• For any (s,n,p,Tr,true) G "E final, tt is a most probable path for Obs ending 
in s and with probability p. 

• Whenever there exists a path for Obs ending in s, T, final includes a 5-tuple 
of the form (s, n, p, tt, true) . 

Notice that all the variables of the constraint program are valuated when a fi- 
nal state is reached, and thus any final constraint store is equivalent to true (as 
trans-ctr prevents any inconsistent store to arise). 

The classical Viterbi algorithm is guaranteed to run in time linear to the length 
of the given sequence, whereas our algorithm may in the worst case run in expo- 
nential time; this may occur if prune-ctr cannot be applied at all. In other words, 
a representation of the constraint store that allows an efficient comparison as in 

^ When any reference to constraints and the constraint store are removed from Fig. (2] we have a 
compact representation of one iteration step of the Viterbi algorithm for HMMs. 
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"sol(cr') C sol(cr)" is essential for the practicability of our algorithm. On the other 
hand, for those problems that can be formulated as a CHMM with effective and 
efficient definitions of check_constraints and the comparison test, the S states may 
stay of a reasonable size. Notice that our algorithm is still correct if we use approx- 
imations of these tests, more specifically, check_constraints may occasionally return 
true when the correct answer is false and the opposite for the comparison. 

4 Implementation of CHMMs in PRISM 

After briefly introducing PRISM, wc propose a methodology to define CHMMs in 
this framework. 



4-1 A brief introduction to the PRISM system 



PRISM ( Sato and Kameya 2008 ) is a powerful system for working with probabilistic- 
logic models, based on an extension to Prolog with discrete random variables, called 
multi- valued switches. We illustrate this with a simple example HMM with two 
states sO and si. A switch declaration, 

values (x,0) . 

associates the named random variable x with a set of outcomes 0. Whenever the 
goal msw(x,X) is called from the program, then a probabilistic choice will be made 
unifying X with an element of 0. Switches can also be defined in a parametric form, 

values (emit (_), [a, b] ) . 7, symbol emission 
values (trans (_), [sO, si] ) . 7, state transition 

where each declaration defines a family of switches, one for each possible instance of 
emit (_) and trans (_) and each instance is given a distinct probability distribution. 
This parametrization can serve to model dependencies: in our HMM example we 
define the parameters to be the states sO and si (plus init for trans (_)), thus 
defining emissions and transitions for each state with the Markov property. Finally, 
we define a logic program to implement the probabilistic model, 

hmm(L):- run_length(T) , hmm(T, init ,L) . 

hmm(0,_, []). 

hmm(T, State , [Emit I EmitRest] ) :- 

T > 0, 

msw(trans (State) .NextState) , 

msw(emit (NextState) ,Emit) , 

Tl is T-1, 

hmm(Tl , NextState .EmitRest) . 
run_length(10) . 

Here, a derivation of the goal hmm corresponds to what we define as a run in sec- 
tion [5?T1 As shown by (jSato 1995|) . Prolog's traditional Herbrand model semantics 
generalizes immediately to a probabilistic semantics when probabilities are given 
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for each random variable (provided that a few restrictions are respected on how 
msw is used in the program). Thus a PRISM program defines a probabihstic model 
that provides a probability distribution for all goals that can be formulated in the 
program's logical language. PRISM assigns each possible derivation of a goal a pro- 
bability defined as the product of the probabilities of the selected switch outcomes 
of switches used in the derivation. Under normal conditions, it will be the case that 
the sum of probabilities of all possible derivations of such a goal is unity, but these 
conditions can be violated in a constrained model. If a program attempts to unify 
the stochastically selected outcome of a switch with some other value distinct from 
that outcome, this unification will fail resulting in a failed derivation. 

PRISM includes built-in mechanisms for efficient probabilistic inference based on 
tabling. During inference, once a probabilistic goal has been solved, its answers are 
put in a global table. Later calls to the same goal will simply lookup the answer in 
the table in constant time. PRISM utilizes this to provide an efficient generalized 
Viterbi algorithm that may be used for the computation of the most likely successful 
derivation for a large number of probabilistic models including HMMs. PRISM also 
includes similar utilities for calculating the probability of a derivation or set of such 
and machine learning algorithms which produce the most likely probabilities for 
switch outcomes in order to explain a set of observed goals. 

4.2 A framework for CHMMs in PRISM 

We have implemented a framework for integration of side-constraints in a PRISM 
program|3 The framework has been used for adding constraints to HMM based 
models, but it should be possible to extend to other kinds of models. The underlying 
idea is that the program is augmented with a constraint store and a constraint 
checker goal is inserted in a few strategic places of the PRISM program. This 
constraint checking is the direct implementation of chcck_constraints of trans_ctr. 
The prune-ctr implementation is not discussed as we use the tabling mechanism of 
PRISM to prune the search space. 

4-2.1 Integration of side- constraints in a PRISM program 

This section describes how our framework can be integrated in a PRISM program. 
As an example, we consider an implementation of the HMM from the previous sec- 
tion. Below the central recursive predicate of the implementation is shown extended 
with constraint checking, 

1 hmm(T, State , [Emit lEmitRest] ,StoreIn) :- 

2 T > 0, 

3 msw (trans (State) ,NextState) , 

4 msw(emit (NextState) ,Emit) , 

5 check_constraints( [NextState, Emit] ,StoreIn,StoreDut) , 

^ The current implementation of the framework is available via http://akira.ruc.dk/~cth/chmm 
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6 Tl is T-1, 

7 liiiim(Tl,NextState,EmitRest,StoreOut) . 

Integration of side-constraint checking is done by extending relevant predicates 
with an extra parameter (Storeln.StoreOut in the code above) to accommodate 
a constraint store and a call to the check_constraints goal (line 5), after each 
distinct sequence of msw applications. 

If check_constraints fails during PRISM inference, then the corresponding 
PRISM derivation fails, and further extensions of this derivation will not be at- 
tempted since it does not constitute a valid run. In effect, inference by PRISM 
will only consider runs which are guaranteed not to violate any of the constraints 
declared for the model. 

Declaration of constraints and implementation of constraint solvers are concep- 
tually decoupled from the PRISM model. The declaration of side-constraints on the 
model is done by declaring facts of the form, constraint (ConstraintSpec). The 
ConstraintSpec associates the constraint with a constraint checker implementa- 
tion and may contain some parameters for this particular instance of the type of 
constraint. 

A satisfiability checker maintains its own constraint store. A satisfiability checker 
for a particular type of constraint consists of an init_constraint_store/2 rule 
and one or more check_sat/4 rules. The init_constraint_store/2 rule is used to 
create a starting point for the constraint store of each declared constraint and is of 
the form, 

init_constraint_store (ConstraintSpec , InitialStore) . 

It is given ConstraintSpec and must unify InitialStore with an initial constraint 
store matching the ConstraintSpec. Additionally, one or more check_sat rules of 
the form, 

check_sat (ConstraintSpec, StateUpdate ,StoreBef ore, StoreAfter) :- . . . . 

must be implemented to check the satisfiability of the constraint. 

As an example, consider an implementation of a cardinality^atmost constraint, 
called cardinality in our framework, 

init_constraint_store (cardinality (_,_) , 0) . 

check_sat (cardinality (U, Max) , U, Visitsln, VisitsOut) :- 

VisitsOut is Visitsln + 1, VisitsOut =< Max. 
check_sat(cardinality(X,_) ,U,S,S) :- X \= U. 

Each time check_constraints is called from the PRISM model, the relevant 
check_sat goals are called for each declared constraint. If any of these fails, so will 
check_constraints. StateUpdate and StoreBef ore are given and check_cons- 
traints is expected to unify StoreAfter to an updated constraint store. In our 
example HMM, the StateUpdate will consist of the [State, Emit] pattern given 
to check_constraints. 

The call to this rule must only succeed if the constraint given by ConstraintSpec 
is not violated by the further information given by the StateUpdate. Constraints are 
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checked incrementally and should only fail if any further updates to the constraint 
store can only lead to failure. 

The constraint stores of individually declared constraints are automatically aggre- 
gated in the constraint store exposed to the PRISM model. Individual constraint 
checkers are unaware of each other and cannot access the individual constraint 
stores of other constraint checkers. The constraints are checked in the order they 
are declared, so this order should be optimized to do pruning as early as possible. 

4-2.2 Efficient inference with a separate constraint store stack 

The tabling mechanism in PRISM makes Viterbi computation and EM learning 
efhcient, but when extra parameters such as the constraint store are introduced in 
the probabilistic goals, PRISM considers these as goals with distinct derivations 
and stores a tabled entry for each version of the goal. This behavior is undesired 
when the extra parameters are used only for internal bookkeeping. The effect of 
this excessive tabling is that the dynamic programming advantages are lost with 
exponential time inference as consequence. 

In ( [Christiansen and Gallagher 2009| a related problem concerning tabling of 
annotations produced by running Viterbi on PRISM programs is approached using 
a program transformation that removes non-discriminating arguments, which do 
not affect the control flow. The annotation can then be recovered from the program 
derivation of the transformed program. 

This approach is not applicable for the constraint store argument because the 
constraint store implicitly affects control flow by limiting possible future deriva- 
tion extensions. The constraint store has to be considered in the inference process; 
otherwise it would be possible to produce invalid derivation paths. 

B-Prolog, on which PRISM is based, supports table modes, but this is not directly 
usable with probabilistic goals in PRISM. It is possible with these modes to declare 
an argument of a tabled goal as an output argument, which means that it will not be 
used as key in the table lookup, but will be unified with the value of the argument 
stored in a tabled goal. For our purpose, declaring the constraint store arguments 
as output arguments would not be feasible since different derivations of the same 
goal may have differing constraint stores and these determine possible derivation 
extensions. 

To deal with the tabling problem we have introduced a separate constraint store 
stack, which avoids storing data locally in parameters of probabilistic goals by 
maintaining the constraint store with assert and retract. This stack is maintained 
in parallel to the derivation stack of Prolog. PRISM utilizes Prolog's backtracking to 
explore possible solutions, so the constraint store stack implementation is required 
to be able to restore a previous constraint store when PRISM encounters failures 
during inference and performs backtracking to find alternative solutions. 

To utilize this functionality, the user should use the goal check_constraints/l, 
which omits the store arguments, rather than check_constraints/3 as stated 
above. We then define check_constraints/l as 

check_constraints (StateUpdate) : - 
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get_store(StoreBef ore) , 

check_constraints (StateUpdate , StoreBef ore , StoreAf ter ) , 

f orward_store(StoreAfter) . 

The new check_constraints/l make use of the goal get_store/l to retrieve the 
current version of the constraint store and f orward_store/l is used to assert the 
updated store, 

get_store(S) :- !, store (S). 

f orward_store(S) :- (asserta(store(S) ) ; retract (store(S) ) ,f ail) . 

If a derivation fails, PRISM backtracks to the choice point in the f orward_store 
rule and retract the most recently asserted store. Then, when exploring alterna- 
tive derivation extensions, the previously asserted constraint store will be used as 
expected. 

4- 2. 3 Complexity analysis of our implementation 

Due to tabling, PRISM guarantees familiar best known complexity bounds of com- 
mon inference tasks on a variety of the models that can be expressed in PRISM, 
which includes HMMs (jSato 2000[) . This implicitly limits the number of calls of 
check_constraints to the same bound. The added complexity of doing constraint 
checking depends on incremental constraint checking cost of individual constraints 
checkers and the number of constraints expressed on the model. 

Space complexity is influenced by table space usage and maximal length of a 
derivation at any given point. Since the asserted constraint store stack contains a 
constraint store fixpoint for each step of the current derivation, it is bounded by 
0(nmax(|c|)) where n is the length of the sequence and max(|c|) is the maximal size 
of the constraint store in any derivation step. Note that the space complexity of the 
separate constraint store stack is unaffected by time complexity and the number 
of states in the model. With more complex models like the pair HMM, the table 
space required for dynamic programming becomes the dominating concern. 

5 Experimental validation 

In this section, we validate our CHMM implementation with the pair HMM pre- 
sented in section [2.21 The experiments were run on a computer with 16 2.4 GHz, 
64 bit Intel Xeon(R) E7340 CPUs and 64 GB of memory. All of the experiments 
utilized only a single processor at a time. 

Our experiments utilize implementations of some common constraints adapted for 
the CHMM framework: cardinality (UpdatePatterns, Max) ensures that entries 
from the list UpdatePatterns occurs at most Max times in the derivation sequence, 
alldif f ensures that all updates in a derivation are different; lock_to_sequence(Seq) 
ensures that the sequence of derivation updates is identical to the sequence repre- 
sented by the list Seq; lock_to_set(Set) ensures that all updates belong to mem- 
bers of the list Set. The operator f orall_subseq(L,C) applies the constraint C to 
every subsequence of length L in the derivation sequence and f or jrange (From , To , C) 
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applies C only the range, To-From, both mclusive; state_specif ic(C) applies C 
only to the State part of the update. 



5.1 Running time of constrained alignment 

The addition of side-constraints to an HMM involves some computational overhead 
in order to check the satisfiability of the constraints, but may also reduce the number 
of possible solutions and therefore the amount of work required to find the optimal 
path. As a practical experiment to demonstrate this, we consider global alignment 
with the pair HMM discussed in section [221 

The overhead of integrating the constraint checking machinery in the model 
is demonstrated in the left part of Fig. |3l where sequences of increasing length 
are aligned. It can be observed that the running time penalty is a constant fac- 
tor and that the polynomial time complexity of the pair HMM is preserved in 
our framework. Obviously, polynomial time inference presupposes incremental con- 
straint checking to be a constant time operation, which may not be the case for 
certain types of constraints. 

In the right part of Fig.[3J two sequences of equal length (32) are aligned, but with 
varying amounts of constraints being enforced. The global cardinality constraint 
is used to enforce an upper limit, L, on the amount of inserts or deletes in the 
alignment, 

constraint (state_specific(cardinality( [insert , delete] ,L))) . 

By constraining the alignment (allowing fewer gaps) , the space of viable solutions 
is reduced. The more constrained the alignment is, the more pruning opportunities 
arise. With a large amount of pruning opportunities, the running time is reduced 
quite significantly. Note that, since the imposed constraint is state_specif ic, the 
number of possible alignments, and hence running time, is unaffected by input 
sequence structure. 



constrained pair hmm (no constraints entorced) - 
pure pair limm - 




sequence iengths 



aliowed insertions/deletions 



Fig. 3. Left: Running time of alignment with a pure pair HMM compared to 
alignment with a CHMM with no constraints enforced. Right: Running time of 
alignment of two sequences of length 32 with varying amounts of allowed insertions 
and deletions. 
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5.2 Efficiency of the separate constraint store stack 

To verify the cfBciency of our constraint store implementation, alignment with a lo- 
cal cardinality constraint was measured for different sizes of input sequences. From 
the measurements, which are reported in Fig. |4l it is apparent that our implemen- 
tation does not incur the same exponential overhead as the naive implementation 
where the constraint store is maintained in the goals and hence tabled. 





constraint store in goals . 

separate constraint store 


100000 


" 1 


10000 


- 


1000 


- 


100 




10 


1 '''' 


1 


f : 



10 20 



30 40 50 60 70 
length of aligned sequences 



10 20 30 40 50 60 70 
length of aligned sequences 



Fig. 4. A comparison of the running time (left) and memory usage (right) of con- 
strained alignment of two sequences with tabled constraints versus a separate con- 
straint store stack. 

Running times and memory usage for a range of different constraints are reported 
in Table [TJ For the sake for completeness, the table also includes running times for 
the version where the constraint store is tabled. 



Constraint 


Sequence 
lengths 


Running 

Time (in ms) 


Memory 
consumption (in kb) 


in goals 


separate 


in goals 


separate 


cardinality ([insert] ,20) 


50 


15460 


3176 


42296 


5723 


cardinality ( [insert] ,40) 


50 


29557 


3968 


93845 


6703 


for_range(l,50, 
lock_to_set ( [match] ) ) 


100 


24649 


4544 


105498 


7137 


for_range(l,90, 
lock_to_set ( [match] ) ) 


100 


20 


48 


1641 


1198 


for_range(l,50, 

lock_to_sequence( [match, . . .match] ) ) 


100 


24829 


4544 


1641 


1198 


for_range(l,90, 

lock_to_sequence( [match,.., match])) 


100 


20 


48 


105498 


7137 


alldiff 


20 


100442 


28 


85654 


256 


forall^ubseqs(5,alldiff) 


10 


1664 


12 


60098 


137 



Table 1 . Running time and memory consumption for alignment with different kinds 
of constraints. 



In most cases the separate constraint store performs better in terms of both 
running time and memory consumption. In the cases where performance is worse, 
it can be attributed to a very small number of possible derivations or constraints 
which rarely change the store. 
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6 Related work 

The term "Constrained HMM" is used in (jRoweis 19991 ILandwehr et al. 2007^ and 
refers to restrictions on the finite automaton associated with an HMM but not as 



constraints on HMM runs. In ( Sato and Kameya 2008 1 , CHMMs were introduced to 



exemphfy an EM algorithm, suited for PRISM programs which ahow the possibihty 
of derivation failures. Our approach differs, as we augment PRISM programs with 
side-constraints and use constraint solving techniques to achieve efficient inference. 

In (jRiezler 1998| . Riezler proposes techniques for inference in probabilistic con- 
straint logic programming. In (jCosta et al. 2008|) relationships between elements of 
a Bayesian Network are expressed as a constraint logic program, which is similar to 
the way we define HMMs. However, our paper focus differs as we study the interest 
of checking satisfiability of side-constraints during inference. 

In the natural language processing community, recent work on Constrained Con- 
ditional Models feature an approach similar to ours. Indeed, Constrained Condi- 
tional Models is a general framework that augments inference and learning of condi- 
tional models with declarative constraints ( [Chang et al. 200"8l ). However, inference 
is expressed as an Integer Linear Programming problem (jRoth and Yih 2005p . In 
this context, more expressive constraints, such as cardinality or all_dif f erent, 
can not be added on an HMM run. Moreover, our PRISM-bascd implementation 
allows us to define the HMM structure separately from the side-constraints and use 
advanced constraint solving techniques. 



7 Conclusions 

In this paper, we propose a framework to define HMMs with side-constraints as 
a Constraint Logic program extended by probabilistic choices. Constraint Logic 
Programming have advantages in terms of more compact expression of CHMMs. 
Inference computations are adapted for CHMMs and conditions for an efficient 
computation arc described. An implementation based on PRISM is proposed and 
well-known constraints and operators have been demonstrated for defining CHMMs. 
Finally, we experimentally validate our approach with a constrained pair HMM used 
for biological sequence alignment. 

As current work, we study how sampling and EM-learning can be adapted for our 
CHMM framework. Indeed, sampling turns out to be problematic in probabilistic 
models with a large probability of derivation failure. In (jSato et al. 2005p . Sato et 
al. address the problem of EM-learning with PRISM programs that can fail and 
their methods are also applicable for our framework. 

As further work, we plan to incorporate more advanced constraint solving tech- 
niques such as those used in Weighted CSP (jLarrosa and Schiex 2004|) in the frame- 
work. This approach would allow us to combine soft constraints solving and in- 
ference and express this as an optimization problem. We also plan to deal with 
the restriction that individual constraint checkers do not share information in our 
framework, so that we can benefit from some of the optimization techniques used 
by other constraint solvers. We are working on extending the library of constraints 
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that can be defined as side-constraints. 
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